Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks
نویسندگان
چکیده
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by adversarial examples that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, feature squeezing, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model’s prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
منابع مشابه
Feature Squeezing Mitigates and Detects Carlini/Wagner Adversarial Examples
Feature squeezing is a recently-introduced framework for mitigating and detecting adversarial examples. In previous work, we showed that it is effective against several earlier methods for generating adversarial examples. In this short note, we report on recent results showing that simple feature squeezing techniques also make deep learning models significantly more robust against the Carlini/W...
متن کاملFeature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples
Deep Neural Networks (DNNs) have achieved remarkable performance in a myriad of realistic applications. However, recent studies show that welltrained DNNs can be easily misled by adversarial examples (AE) – the maliciously crafted inputs by introducing small and imperceptible input perturbations. Existing mitigation solutions, such as adversarial training and defensive distillation, suffer from...
متن کاملLess is More: Culling the Training Set to Improve Robustness of Deep Neural Networks
Deep neural networks are vulnerable to adversarial examples. Prior defenses attempted to make deep networks more robust by either improving the network architecture or adding adversarial examples into the training set, with their respective limitations. We propose a new direction. Motivated by recent research that shows that outliers in the training set have a high negative influence on the tra...
متن کاملAdversarial Phenomenon in the Eyes of Bayesian Deep Learning
Deep Learning models are vulnerable to adversarial examples, i.e. images obtained via deliberate imperceptible perturbations, such that the model misclassifies them with high confidence. However, class confidence by itself is an incomplete picture of uncertainty. We therefore use principled Bayesian methods to capture model uncertainty in prediction for observing adversarial misclassification. ...
متن کاملEAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples
Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify. Existing methods for crafting adversarial examples are based on L2 and L∞ distortion metrics. However, despite the fact that L1 distortion accounts for the total variation and e...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1704.01155 شماره
صفحات -
تاریخ انتشار 2017